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Abstract 



We introduce a novel local time- stepping technique for marching-in-time algorithms. 

The technique is denoted as Causal-Path Local Time-Stepping (CPLTS) and it is ap- 
^Q, plied for two time integration techniques: fourth order low-storage explicit Runge-Kutta 

(LSERK4) and second order Leapfrog (LF2). The CPLTS method is applied to evolve 

Maxwell's curl equations using a Discontinuous Galerkin (DC) scheme for the spatial 

discretization. 
O Numerical results for LF2 and LSERK4 are compared with analytical solutions and 

the Montseny's LF2 technique. The results show that the CPLTS technique improves 



Y^ the dispersive and dissipative properties of LF2-LTS scheme. 

c/^ Keywords: Causal Path, Local Time-Stepping, LTS, Discontinuous Galerkin Methods, 

^ Maxweh's equations, DGTD 

a. 



1. Introduction 



Many time-stepping algorithms have been proposed in order to improve the perfor- 
mance of Discontinuous Galerkin (DG) based schemes by increasing the maximum time 
^^ step while preserving stability. There are usually two kinds of strategies used for this 

^-H purpose: to use implicit schemes [1, 2] or, to use a explicit local time-stepping (LTS) 

<^il technique [3, 4, 5, 2, 6, 7, 8, 9]. The advantage of LTS schemes versus implicit strategies 

^^ is that the former can be used recursively and easily paralellized. This leads to an im- 

CO provement in performance independent of the relative size distribution of the elements 

^"^ in the mesh. Additionally, time integration algorithms may have other constraints on 

^ the time-step arising from accuracy considerations and other inherent time scales such 

as in dispersive media [10] or when hybridized with network/lumped elements models 
[11], LTS techniques can also contribute to mitigate these problems in a simple and 
C^ straightforward way. 

When a second order convergent spatial discretization is used, the most commonly 
used time integration method is the second-order leapfrog (LF2) algorithm. Several 
authors [4, 5] use a LF2-LTS scheme proposed by Montseny [6] consisting on using the 
last known values of the fields on the larger time stepped region each time that the 
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smaller one needs a field value. Piperno [2] adopts a similar approach based on a Verlet 
scheme. Alvarez [3, 12, 13, 14, 15] contributed with a novel approach to perform LTS in 
LF2 schemes whereby an interpolation between the fields is used in an interface between 
the larger and smaller time-stepped regions. A rigorous demonstration of the stability 
and dispersive properties of these schemes is still an open problem. 

Diaz and Grote [7, 8] implemented a rigorous study on the stability and dispersion of 
LF-LTS high-order schemes applied to the second-order wave equation by means of an 
eigenvalue analysis. They found that the LTS introduces numerical dispersion and can 
produce instabilities if the global time step is not slightly reduced with respect to a classic 
implementation. The authors also found that the global stability could be improved by 
enlarging the smaller time-stepped region. 

For higher order methods, explicit Runge-Kutta (RK) algorithms [16, 17, 18, 19, 20, 
21] seem to be preferred with respect to LF schemes [22]. Despite of their popularity, there 
are less works in the literature related to RK-LTS than to LF-LTS. For Maxwell's equa- 
tions RK-LTS algorithms usually rely on interpolations at the interfaces using previously 
computed solutions [16] or arbitrary high-order derivatives (ADER) schemes [9, 23, 24]. 

In this paper we present a novel LTS technique that can be applied to a large va- 
riety of time integration algorithms. It does not need interpolation between computed 
solutions and nor directly uses any previously known values. Numerical results show- 
ing comparisons with analytical solutions for applications on a second-order Leap-Frog 
(LF2) and on a fourth-order Low Storage Explicit Runge-Kutta scheme (LSERK4) are 
shown to demonstrate the advantages of the proposed LTS technique. 

2. Discontinuous Galerkin Semidiscretization 

Maxwell's curl equations for source-less homogeneous media can be written as 

V X ^ = -^dtH 

W xH = edtE (1) 

For simplicity, in our discussion we will assume that e and fi do not vary in the compu- 
tational domain, and use a system of units where e = jj. = 1. 

We tessellate the computational domain with /c = 1, . . . , i^ non-overlapping tetrahe- 
drons. In each of those, we apply the Discontinuous Galerkin's formalism [16, 18, 19] to 
obtain 

MkdtEkit) + SkUkit) -Y,^kfiilf{t) = 

/ 

MkdtHkit) ^ Sk^kit) -Y^^kfKfit) = (2) 

/ 

With A^ being the mass matrix, S the spatial semidiscretization of the curl operator and 
J^f the lift operator for face /. E and H are column vectors containing all the degrees 
of freedom for the electric and magnetic field respectively. E* and H* are the numerical 
fiuxes. 



We define a state vector q/^ = [E/^ Hk]'^ containing all the Nj. degrees of freedom 
of element k. With this definition, we can rewrite system (2) as a single equation that 
governs the time evolution of the system, 

dtc^,{t) = -{Mir' hl^kit) - E-^fe/ i^kfCLkit) - 4/+qfe/+(t)) J (3) 

The DG method gives us some freedom in the selection of the operators Skf and Ekf^ as 
long as it respects the properties of consistency, continuity, and monotonicity needed for 
the numerical fiux [25] . If this operator is block diagonal with all its components being 
1/2, we will say that the semi-discrete scheme is using a centered fiux and therefore is 
numerically non-dissipative [26, 3]. On the other hand, if these operators are non-block 
diagonal we will say that the fiux is being penalized and therefore the semi-discrete 
scheme is numerically dissipative. We will mostly focus on a particular case of penalized 
fiux: the upwind fiux [18, 27], coming from the solution of the Riemann problem. 

When using penalized fiuxes some dissipation is introduced and more operations are 
needed to compute the fiux terms. However, introducing such penalization is known 
to improve numerical dispersion and suppress spurious modes [6, 18, 3, 16, 28]. Thus, 
the penalized fiuxes approach is usually preferred for simulations not requiring very long 
integration times. 

To simplify the discussion further we will change the basis of the vector space using 
an invertible operator Vk on equation (3) that diagonalizes only the locally applied 
operators, 

W, = -Vk\Ml)-\Sl -Y.TlfSkf)Vk (4) 

/ 
We can also define the eigenmodes as 

Pfe =Vj;^qk (5) 

and the external operators as 

Vfc/ = -ri:\M%':Fif£kuVk (6) 

This change of basis let us write equation (3) in the following compact form 

dtPk{t) = WkPkit) -^Y^^kfPkuit) (7) 

/ 

3. Time integration 

In the following discussion, we will focus on two time integration methods that are 
also the most popular choices in conjunction with DG semidiscretizations. 

3. 1 . Seco nd- order Leap - Frog (LF2) 

The second-order leap-frog method [29] is applied by alternately evolving the E"^ and 
jjn+i/2 fig}(^g^ arbitrarily defined at times tn and tn + At/2 respectively. This implies 

3 



that we do not have a fuhy defined state vector in the sense of eq. (3) for a given time 
t. To obtain the future values from a present state the fohowing algorithm is applied 



Hn+3/2 ^ jj^+1/2 ^ ^^ ^^ f^n^ H^+1/2A (3) 



With Ch being a function representing the result of applying the spatial semi-discretization. 
When centered fiuxes are used, the operator jC^ only uses H"^+^/^ or E"^ as arguments. 
This implies that the scheme is reversible in time and will preserve energy as long as the 
time step used is below a maximum value At^ set by a CFL-like condition [1, 2, 29]. 

3.2. Low-Storage Explicit Runge-Kutta (LSERK4) 

The second method that we will use in our discussion is the five-stage fourth-order 
Explicit Runge-Kutta method (LSERK4) [16, 21]. This method states that for a given 
vector representing the state of the system, i.e. Pk{t) = p^ we can find an approximate 
solution state pk {t -\- At) = p^^ applying the following algorithm 

P/e — P/C5 



r(^) = a,r(-^) + At W,pt'^ + E^^/P- 



f 

p^'' = p^i' (9) 

with i G [1,...,5] and the coefficients a^, bi and q taking the values indicated in Table 
1. The LSERK4 scheme is one of the most used methods in high-order Discontinu- 
ous Galerkin semi-discretizations, because it introduces low dispersion and dissipation. 
Contrary to other RK implementations, the low-storage version requires the storage of 
only two times the number of degrees of freedom in the scheme at the expense of one 
additional stage. RK methods are constrained by the spectra of the operator W/c, i.e. 
all the eigenvalues of Wk must lie inside of the stability region of the RK scheme. Con- 
sequently, the time step must be chosen sufficiently small, e.g. for a nodal basis the 
following inequality must hold [16] 

Atk<-min^ (10) 

where min^ Ar^i indicates the minimum distance between nodes in element k and c^ is 
the maximum speed of light in the element k. 

Despite its many advantages, LSERK4 has a high computational cost and the nu- 
merical dissipation it introduces can be a factor depending on the application. 

4. The Causal-Path LTS technique 

In this section we introduce the Causal-Path technique as a novel way of performing 
LTS in different time integration techniques. We require two basic properties for the 
time integration technique: 
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cti 



Ci 



567301805773 
1357537059087 

2404267990393 
2016746695238 

-3550918686646 
2091501179385 

1275806237668 
842570457699 



1432997174477 
9575080441755 

5161836677717 
13612068292357 

1720146321549 
2090206949498 

3134564353537 

4481467310338 

2277821191437 
14882151754819 



13612068292357 
9575080441755 



22526269341429 
6820363962896 



2006345519317 
3224310063776 



28032321613138 
2924317926251 



Table 1: Coefficients for tiie low-storage five-stage fourth-order Explicit Runge-Kutta 
method (LSERK4) 



1. It has to provide a fully defined state q_k{t) for each element. 

2. The next state q/e(t + At) can be explicitly computed from a neighbourhood of 
elements. 

As a first step we will organize the elements in different groups, called tiers, according 
to their time steps denoted as At^. An element k will belong to a tier m = [0, . . . , Nm — l] 
if its maximum time step Atk is such that 



At^ < Atk < At^+^ 



(11) 



In order to compute the next time step, we need to use the field values at local and 
neighbor elements, p^^'*~ and P^T'*~ • If there is no connection with other elements 
belonging to a lower tier, we can evolve all the elements in m using their At^. However, 
in the border between a tier m and m-\-l we can not apply the direct algorithm because 

(7n,i— 1) 



P^^+i has not been computed. 



the value p^Jf 

The strategy that we propose is to compute the values p^^+i using At^~^ = 
hi-iAt^ as time step wherever they are necessary. If to do that, we need additional 
neighbour values that have not been computed, and we recursively apply this idea until 
a known value is found. Thus, starting from m = we can compute all the stages needed 
to evolve it before starting with the tier m = 1 and so on. Finally, the values p^^+i 
are casted aside and the upper tiers uses the original values from the lower tier. 

To compute the next time step values in each of the Nm tiers we may need to compute 
Ns stages in all the elements of tier m. We will also need to compute intermediate stages 
between the stages in the m -\- 1 tier. So, in order to avoid a possible interleaving with 
other higher tiers, we impose that the {Ng — l)-depth neighbourhood of a tier m is only 
composed of elements belonging to tier m-\-l or m — 1. This additional condition for the 
tier assortment is illustrated in Figure 1. 

The implementation of this algorithm may seem difficult at a first glance; however, the 
recursive nature of the algorithm allows us to make use of recursive calls to the function 
used to evolve the system. Every time the function is called, we pass the information 



about the tier in which this is being computed and the time step that has to be apphed. 
So starting from a cah to evolve the N^^ tier for a given time step At, the function wih 
recursively call itself on each of the stages of the algorithm passing Nm — 1 and hi At 
as arguments and evolving its corresponding tier elements. This technique also requires 
that the degrees of freedom in the region being interfaced are saved in the higher tier. 
Note that no interpolation of field values is necessary and only past field values generated 
by the discretization itself are utilized. 

In the next sections we describe two examples of the CPLTS technique, applied to 
the LF2 and LSERK4 algorithms, together with illustrations to clarify the concepts. 




Figure 1: This figure illustrates the concept of 4-depth neigbourhood of two different 
regions. The darker colors indicate elements belonging to a lower temporal tier and thus 
have a smaller time step. 



4.1. LF2- CPLTS 

Since the LF2 performs iterations using a single stage we can create any distribution 
of Ns intermediate stages in the higher tiers to fit the evaluations needed by the smaller 
tiers. The time-steps of the intermediate stages would then be /i^At^+^'* = At^, with 
hi > and the restriction ^. "" hi = 1. The choice of hi = l/Ng would be the most 
favourable in terms of computational cost. Figure 2 shows an schematic view of this 
scheme applied to the case hi — /12 = 1/2. Note that this freedom in choosing hi is 
an improvement compared with the Montseny's scheme [6], which is constrained by the 
condition At^ = At^~^(l + 2/c). This is also an improvement with respect to the Verlet- 
Piperno's scheme [2] in which At^+^ = 2At^, and it allows our scheme to adapt to the 
different transitions as necessary; however, for the sake of simplicity we will not consider 
these cases here. 

On the other hand, we need both values of E and H at same time instants in order 
to find a fully-defined state of the system at any given stage p(^'*). In other words, 
we can't apply this LTS technique computing only E(tn) and H(tn + At/2) because to 
compute the intermediate value of a lower tier, let's say E^~^(tn + At"^/2) we would 
need the values of the magnetic field H^(tn). To overcome this issue we need to apply 
LF2 twice, doubling the computational costs with respect to the conventional approach. 
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When we apply this scheme to a non-dissipative semi-discretization (e.g. DG with 
centered flux) we find that the scheme is unstable showing growing high-frequency nu- 
merical modes. 

The introduction of a penalized fiux solves this problem through higher frequency 
damping [17, 26]. 
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Figure 2: Schematic view of the LF2-CPLTS algorithm particularized to the case hi = 
/i2 = 1/2. Vertical lines indicate boundaries between elements. Dashed horizontal lines 
are intermediate stages. Continuous horizontal lines indicate time steps. 



4.2. LSERK4-CPLTS 

When the CPLTS technique is applied to an LSERK4 we note that the stages are not 
evenly distributed in time. As a result, we apply a variable time step in the lower tiers 
(Figure 3). The values for Af^ and At^~^ are chosen such that equation (10) is always 
enforced and therefore 

max(/iOAt^ = At^-^ (12) 

with m.8iKi{hi) being the maximum stage size (for LSERK4 max^(/i^) = /i4 = C5 — C4 = 
0.336026 ~ 1/3). Whenever we compute intermediate stages in higher tiers we satisfy 
this condition because in higher tiers this condition is less restrictive. However, every 
time we apply this division, Ns times more computational operations are needed to get 
a speed-up of about three times in the higher tier region. So, if the largest tier region 
is not at least 5/3 times larger than the smallest we won't see any appreciable global 
speed-up. 

For this reason it seems preferable to organize the time tiers with At^~^ = Af^ /Ns 
rather than with the maximum stage size criteria. By doing this, we are computing an 
stage in the lower tier region with a time-step bigger than is strictly allowed based on 
a conventional CFL-like criterion for the associated direct algorithm, which could be a 
source of potential instability. On the other hand, the smaller stages in the lower tier 
compute the solution using a time-step smaller than the maximum allowed and thus 
introducing an additional numerical dissipation. We may then wonder if the additional 
dissipation introduced by the smaller stages offsets the potential for instability intro- 
duced by the larger. Note that as long as these effects are mostly kept limited to high 
frequency components (which are under-resolved anyway) the solution accuracy should 
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not be impacted. In the next sections we perform some tests to assess the practical 
vaUdity of this approach. 
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Figure 3: LSERK4-CPLTS sketch. Additional operations are made only in the 4-depth 
neighbourhood of the smaller tier region (darker). In this sketch At^~^ = h^At^. 



5. Numerical Results 

In this section we present comparisons between results using the proosed CPLTS 
technique, the LF2-LTS technique introduced by Montseny [6], classical implementations 
of the algorithms, and analytical solutions. 

For all cases we use nodal basis of order P = 2 and numerical upwind fluxes as 
described in [16, 18, 19, 30]. This implies that we are using 60 degrees of freedom 
per element. The implementation has been performed with an in-house C++ code^ 
with OpenMP parallelization^. GiD was used to obtain meshes and for pre and post- 
processing^. Simulations for the reflection and resonance problems were performed using 
a single processor laptop with Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz processor 
and running Ubuntu 12.04 LTS. The RCS problem were run in a desktop computer with 
an Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz processor with 12 cores and Ubuntu 
10.04 LTS. 

5.1. Reflection caused by a non-homogeneous mesh 

The first example we present is an study of the numerical reflection caused by dif- 
ferences in the mesh size, a similar type of analysis can be found in [31, 32]. This type 



1 Compiled with GNU C++ v4.6.3 using -03 -ffast-math flags 
■^For more information visit: http://www.ugrfdtd.es 
^For more information visit: http://www.gidhome.com 



of analysis is important for LTS because it quantifies a source of additive noise on the 
results. Figure 4 shows the meshes used, together with an isometric view of the boundary 
conditions employed. A plane wave excitation with z-polarization is introduced in one of 
the ends of the computational domain and the other end is backed by an Silver- Mueller 
absorbing (SMA) boundary condition. The side-walls of the domain are Perfect Electric 
Conducting (PEC) and Perfect Magnetic Conducting (PMC) boundary conditions at the 
xy and xz planes respectively. The mesh is 1 m long from one end to the other. The 
coarse cell size is 7.5 cm and the cell sizes in the finer region vary from 0.1 to 0.5 cm. 

Figures 5, 6 and 7 show the refiection coefficient in a range of frequencies. The 
closer the values are to zero the better are the properties of the scheme. We observe 
that for this case the LF2 with a fully defined state (LF2full) exhibits slightly better 
properties than the classic LF2 scheme. A possible explanation for this is that the incident 
wave is resolved using more time steps. In LF2-LTS and LF2-CPLTS, we observe some 
additional degradation when compared to the classic LF2 schemes. The CPLTS exhibits 
less reflection than the Montseny's LTS, the difference growing with the ratio between 
the coarser and flner mesh. The three LSERK4 flgures exhibit a better behaviour than 
the LF2, as expected due to the higher order of the time integration technique. When 
the maximum stage is used for the tier assortment, we observe a higher degradation in 
the low-frequency regime, probably because more time-stepping operations are being 
performed. The results for the LSERK4-CPLTS are very encouraging as we see little 
differences between the use the LSERK4-CPLTS technique and the classic LSERK4. 
Table 2 shows data corresponding to the tier assortment and computational times. As 
expected, the LF2-CPLTS is able of create more tiers than LF2-LTS because it only needs 
a ratio of two between maximum time step sizes. The CPU times for this simulation 
are listed for reference only and are not quite representative because the time employed 
to compute the excitation at the boundaries and the initialization is signiflcant when 
compared with the operations performed to evolve the elements. 

5.2. PEC cavity resonances 

As a second example we show comparisons of evolving a spatially uncorrelated random 
field (white noise) to study the resonances of a 1 m PEC cavity, in a similar way as done 
in [33]. The mesh used is depicted in Figure 4c with PEC boundaries at the ends 
rather than SMA. The resonance frequencies are obtained by performing the Fourier 
transform of the electric field evolution after 250 ns at a point separated 0.3m from 
one of the boundaries. Figure 8 show the eigenfrequencies obtained by the simulations 
together with the exact ones (black dashed vertical lines). The LF2 schemes don't show 
any particular difference with respect to their dispersive properties. The differences in 
amplitude between LF2 and LF2full can be attributed to the different initial treatment 
of fields. The LSERK4 schemes exhibit a similar behaviour in frequency but we observe 
additional attenuation when the CPLTS is used. When the tiers are assorted using the 
maximum stage criteria the attenuation is reduced. No late time stabilities were observed 
in any of the simulations. Table 3 shows data corresponding to the tier assortment and 
computational times. The CPU times show a clear improvement with the LSERK4- 
CPLTS algorithm while the gains for the LF2-LTS are more moderate. LF2-CPLTS 
does not perform better than the LF2. 



(a) Single Interface 15:1 ratio. 




(b) Single Interface 75:1 ratio. 




(c) Slab 7.5:1 ratio. 




(d) Boundary conditions. 
Figure 4: Meshes used for the study numerical reflections by an inhomogeneous mesh. 
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Numerical reflection 



Numerical reflection 





(a) LF2 (b) LSERK4 

Figure 5: Numerical reflection from a single interface with ratio of 15:1 



Numerical reflection 



Numerical reflection 





(c) LF2 (d) LSERK4 

Figure 6: Numerical reflection from a single interface with ratio of 75:1 



Numerical reflection 



Numerical reflectioi 





- Iserk4 

- Iserk4-CPLTS 

- lserk4-CPLTS-maxSta(!e ::; 



(a) LF2 (b) LSERK4 

Figure 7: Numerical reflection from a slab with ratio of 7.5:1. 
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Numerical resonances 
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Frequency [Hz] 



(a) LF2 
Numerical resonances 




Frequency [Hz] 
(b) LSERK4 

Figure 8: Resonances in a 1 m PEC cavity with slab meshing. Vertical dashed lines 
represent exact eigenfrequencies. 



12 





Integrator 




Number of Elements 


At™ ps] 


CPU [s] 


Tier 





1 


2 


3 4 5 6 







^ 


LSERK4-CPLTS 


120 


312 


- 


- 


0.624 


226 


^ 


LSERK4-CPLTS-mS 


120 


24 


288 


- 


0.624 


485 


^ 


LSERK4 


432 


- 


- 


. 


0.624 


468 




LF2-LTS 


120 


8 


304 


. 


0.281 


78 


1—^ 

or 


LF2 


432 


- 


- 


. 


0.281 


211 


1 

CO 


LF2full-CPLTS 


80 


48 


12 


292 - 


0.281 


173 


a 


LF2full 


432 


- 


- 


- 


0.281 


1799 


^ 


LSERK4-LTS 


600 


24 


288 


. 


0.12 


3148 


^ 


LSERK4-CPLTS-mS 


600 


24 


24 


264 - 


0.12 


15097 


^ 


LSERK4 


912 


- 


- 


- 


0.12 


4700 


tp 


LF2-LTS 


600 


8 


12 


292 - 


0.06 


1444 


or 


LF2 


912 


- 


- 


. 


0.06 


2296 


1 

1— 1 


LF2full-CPLTS 


400 


208 


12 


8 12 184 88 


0.06 


3524 


a 


LF2full 


912 


- 


- 


- 


0.06 


7211 


^ 


LSERK4-CPLTS 


240 


288 


- 


. 


1.24 


190 


^ 


LSERK4-CPLTS-mS 


240 


288 


- 


. 


1.24 


342 


^ 


LSERK4 


528 


- 


- 


. 


1.24 


325 


9 


LF2-LTS 


240 


288 


- 


- 


0.55 


158 


en 


LF2 


528 


- 


- 


. 


0.55 


151 


1 


LF2full-CPLTS 


160 


96 


272 


- 


0.55 


157 


a' 


LF2full 


528 


- 


- 


- 


0.55 


254 



Table 2: Element Tier assorting for LTS in the plane wave reflection. 



5.3. RCS Analysis of a PEC Sphere 

As a last test case we present a bi-static Radar Cross Section (RCS) analysis [18]. 
Figure 9 show the boundary conditions used. Symmetry conditions were used to reduce 
the computational domain and the 1 m radius sphere was modelled using a PEC boundary 
condition. SMA boundary conditions were used to terminate the domain 3 m away from 
the surface of the sphere. The illumination was done using a Total Field/ Scattered Field 
boundary condition in a spherical surface located 1 m away from the sphere using a 
Gaussian wave with 1 ns spread, ^/-polarization and propagating along the x axis. The 
typical element size of the mesh was 25 cm everywhere except in the PEC spherical 
surface modelling the sphere in which was set to 5 cm. 

Figure 10 shows the results of the analysis for the various LF2 and LSERK4 schemes 
under study. At 450 MHz we see that the LF2 methods fit the Mie's analytical solution 
but the LF2 using Montseny's approach exhibits an angular offset caused by an appre- 
ciable difference in the dispersion relation. At 600MHz all methods present a higher 
deviation, caused by a poorer resolution of the spatial grid. 

The LSERK4 results exhibit a better behaviour than the LF2, capturing the main 
features of the analytical solution. The application of CPLTS seems to better preserve 
the dispersion relation an thus the position of the peaks. However, at 600 MHz we can 
observe an appreciable numerical dissipation being introduced. 

Table 3 shows data corresponding to the tier assortment and computational times. 
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Integrator 




Number of Elements 


At"" [ps 


CPU s 


Tier 





1 


2 


3 4 









LSERK4-CPLTS 


240 


288 


- 


- 


1.24 


2403 


i 


LSERK4-CPLTS-mS 


240 


288 


- 


- 


1.24 


4051 


p 


LSERK4 


528 


- 


- 


- 


1.24 


4013 




LF2-LTS 


240 


288 


- 


- 


0.55 


1207 


or 
1 
CO 


LF2 


528 


- 


- 


- 


0.55 


1917 


^ 


LF2full-CPLTS 


160 


96 


272 


- 


0.55 


2381 


cr 


LF2full 


528 


- 


- 


- 


0.55 


3615 




LSERK4-CPLTS 


4535 


57279 


157 


- 


2.1 


3733 




LSERK4 


61971 


- 


- 


- 


2.1 


8613 


CA3 


LF2-LTS 


522 


8614 


52798 


37 


0.95 


963 


1—^ 


LF2 


61971 


- 


- 


- 


0.95 


4348 


LF2full-CPLTS 


114 


2155 


7411 


34521 17770 


0.95 


1851 




LF2full 


61971 


- 


- 


- 


0.95 


8642 



Table 3: Element Tier assorting for LTS in the resonant cavity and RCS problems. 



In this case, the LSERK4-CPLTS is able to provide a considerable speed up, reducing 
the CPU time from 8613 to 3733 s (- 2). The LF2 LTS techniques yield a speed-up of 
about four times the non-LTS counterparts. The CPLTS speeds up the classic LF2 by a 
factor about two. 



6. Tier assortment 



In practice, an automated meshing process may produce a quite random tier assort- 
ment having an important impact in performance and accuracy. This occurs because we 
let the LTS algorithm and the tier-assortment to span the entire mesh. Notice that in 
practice this may not be necessary an optimal approach. Figures 11, 13, 14, 15, 16 and 
17 illustrate this phenomenon. For the 1 m PEC sphere (Fig. 11, 13, 14), after imposing 
a constraint in the element size of 5 cm and leaving the rest with 25 cm we observe that 
there is an appreciable amount of scattered elements in the mesh belonging to a lower 
tier. The meshing algorithm is able to respect the sizes imposed to the elements in the 
regions closer to the surfaces but not in the inner part. Figures 15, 16 and 17 represent a 
variation of the Im PEC sphere case in which an small cylinder representing a small scale 
feature has been appended to the sphere. In this example we observe that the presence 
of scattered lower tiers happens also in problems exhibiting disparate scales, unless the 
user pre-sets a given maximum number of tiers. 

For the LSERK4 algorithm, scattered lower tiers degrade performance because, as 
depicted in Figure 12, many elements in the neighbourhood of lower tiers have to perform 
additional operations. Additionally, the CPLTS technique requires the storage of the 
elements in the neighbourhood of smaller tiers, increasing the memory consumption. 
Often the meshing and tier assorting processes result in the highest tier having a very 
small amount of elements (see Table 3), so it is up to the user whether to preserve those 
tiers or not. In the LF2-CPLTS case, we observe in Figures 13 and 16 that the assorting 
is able to create more tiers than in the LF2-LTS case. This has a positive impact in 

14 




(a) PEC (red) and PMC (green). SMA is 
not depicted. 




(b) Total Field region 
Figure 9: Boundary conditions for the RCS case. 



performance, which is specially relevant in cases with disparate spatial scales such as the 
presented in Figure 16. 

7. Conclusions 

In this work, we have introduced the Causal-Path concept as a way to perform 
LTS on explicit marching-on-time algorithms. We have applied this concept to the DG 
discretization under two different time integration techniques: LSERK4 and LF2. 

When applied to LSERK4, for all the examples studied, the CPLTS technique has 
improved the performance by a factor of about two. The dispersive properties of the 
scheme are not affected while some dissipation is introduced. 

For LF2 the performance is also improved by a factor of about two for a bi-static 
RCS analysis case. In contrast, the commonly used Montseny's technique provides an 
speed up of about four. The CPLTS technique however seems to present better disper- 
sive properties than the Montesny's approach and has better adaptivity to multiscale 
problems. 
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